Major Enhancements in Feature Detection and JSON Output Management #9

jorgeaduran · 2024-02-17T07:51:43Z

Major Enhancements in Feature Detection and JSON Output Management

Description

This PR introduces a series of updates aimed at improving the efficiency and accuracy of feature detection and JSON output generation, and at giving users more control over both. The changes span several components, refining both the string-extraction logic and the way data is represented. The key enhancements are summarized below:

Feature Detection Improvements

Optimized Unicode String Extraction: We've refined the extract_unicode_strings function to better handle UTF-16LE and UTF-16BE encodings, using targeted regex patterns that improve the accuracy of string detection.
Advanced Bytes Feature Evaluation: The evaluate method in BytesFeature now uses a sliding window, so specified byte sequences can be detected at any offset within the data.
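For illustration, here is a std-only Rust sketch of what UTF-16LE/BE patterns of this kind typically match: runs of printable ASCII characters where each character is followed (LE) or preceded (BE) by a zero byte, roughly the regexes `(?:[\x20-\x7E]\x00){4,}` and `(?:\x00[\x20-\x7E]){4,}`. The function name `extract_utf16_ascii_runs`, the `Endian` enum, the `min_len` threshold, and the restriction to the printable ASCII range are assumptions. The PR does not show the actual patterns used by extract_unicode_strings.

```rust
/// Byte order of the UTF-16 code units to scan for.
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Hypothetical helper (not the project's extract_unicode_strings): returns
/// (byte offset, string) for every run of at least `min_len` printable ASCII
/// characters encoded as UTF-16 in `data`.
fn extract_utf16_ascii_runs(data: &[u8], endian: Endian, min_len: usize) -> Vec<(usize, String)> {
    let mut results = Vec::new();
    let mut i = 0;
    while i + 1 < data.len() {
        let start = i;
        let mut s = String::new();
        // Consume 2-byte code units while each one encodes printable ASCII.
        while i + 1 < data.len() {
            let (ch, high) = match endian {
                Endian::Little => (data[i], data[i + 1]), // 'A' 0x00
                Endian::Big => (data[i + 1], data[i]),    // 0x00 'A'
            };
            if high == 0 && (0x20..=0x7E).contains(&ch) {
                s.push(ch as char);
                i += 2;
            } else {
                break;
            }
        }
        if !s.is_empty() && s.len() >= min_len {
            results.push((start, s));
        }
        // No code unit matched here: step one byte so odd offsets are tried too.
        if i == start {
            i += 1;
        }
    }
    results
}
```

A hand-written scanner is used here only to keep the example dependency-free; the PR itself describes a regex-based approach.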

JSON Output Management

Enhanced JSON Generation for Map Features: The new -f parameter lets users filter map features by type, making the JSON output more relevant and manageable. The JSON dump is triggered by the -m flag and requires an output path specified with the -o parameter.
Clean String Function: We've added a function to sanitize extracted strings, ensuring the output is free from null characters and non-printable ASCII characters.
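A minimal sketch of such a sanitizer, assuming "non-printable ASCII" means the ASCII control range (0x00 to 0x1F plus 0x7F) and that non-ASCII characters are kept unchanged. The name `clean_string` and both assumptions are hypothetical; the PR does not show the function itself.

```rust
/// Hypothetical sanitizer: removes NUL and every other ASCII control
/// character (U+0000..=U+001F and U+007F) from an extracted string.
/// Non-ASCII characters are kept as-is (assumption).
fn clean_string(raw: &str) -> String {
    raw.chars().filter(|c| !c.is_ascii_control()).collect()
}
```

Whether tabs and newlines should survive sanitization is a design choice; this sketch drops them along with the other control characters.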

Safety and Usability Enhancements

Boundary Checks and Error Handling: The detect_ascii_len function now uses boundary checks and checked arithmetic to prevent buffer over-reads and integer overflows, improving the overall safety of string extraction.
CLI Options Expansion: The new filter_map_features option in CliOpts gives finer control over which features are processed.
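The PR does not show detect_ascii_len's signature, so the sketch below is hypothetical: it assumes the function receives a buffer, a start offset, and a maximum length, and counts printable ASCII bytes from that offset. It illustrates the two safeguards described above: `checked_add` so that `offset + max_len` cannot wrap around, and clamping so the scan never reads past the buffer. `ScanError` is a stand-in for whatever error type (such as the BufferOverFlowError this PR adds) the project actually returns.

```rust
/// Stand-in error type; the project's real error (e.g. BufferOverFlowError) may differ.
#[derive(Debug, PartialEq)]
enum ScanError {
    /// `offset` lies beyond the end of the buffer.
    OutOfBounds,
    /// `offset + max_len` does not fit in a usize.
    Overflow,
}

/// Hypothetical bounds-checked scan: counts consecutive printable ASCII bytes
/// (0x20..=0x7E) starting at `offset`, looking at no more than `max_len` bytes.
/// A NUL byte or any other non-printable byte ends the string.
fn detect_ascii_len(data: &[u8], offset: usize, max_len: usize) -> Result<usize, ScanError> {
    if offset > data.len() {
        return Err(ScanError::OutOfBounds);
    }
    // Checked arithmetic: offset + max_len must not wrap around.
    let end = offset.checked_add(max_len).ok_or(ScanError::Overflow)?;
    // Boundary check: never read past the end of the buffer.
    let end = end.min(data.len());
    let len = data[offset..end]
        .iter()
        .take_while(|&&b| (0x20..=0x7E).contains(&b))
        .count();
    Ok(len)
}
```

Returning an error on overflow is one possible design choice; using `saturating_add` instead would be equally valid if callers simply want to scan to the end of the buffer.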

Why This Matters

Together, these updates improve the project's ability to detect and represent data features accurately, cover a broader range of encoding scenarios and user needs, and lay a solid foundation for future development.

Testing

Test cases cover a variety of scenarios, including different encoding formats, feature types, and JSON output configurations.

I look forward to your feedback and any further suggestions for improvement!

- Optimized extract_unicode_strings to improve the efficiency of UTF-16LE and UTF-16BE string extraction, using refined regex patterns for more accurate detection.
- Introduced the filter_map_features option in CliOpts for finer-grained feature filtering.
- Added BufferOverFlowError to handle buffer overflow conditions more effectively.
- Implemented a get_name method for Feature to support JSON dumping of features.
- Refined JSON generation for map_features: the -m flag triggers the JSON dump, the -f parameter filters by feature type, and the -o parameter (required) specifies the output path.
- Streamlined the read_string function for ASCII and Unicode string extraction; it now uses the output of read_bytes and ensures exact chunking for UTF-16 conversion.
- Enhanced the detect_ascii_len function with boundary checks and checked arithmetic, preventing over-reads and integer overflow and ensuring accurate ASCII length detection.
- Modified BytesFeature's evaluate method to use a sliding window for pattern detection, improving feature detection across various contexts.
- Implemented logic to filter out empty features before insertion, so only meaningful data is processed.
- JSON generation now occurs only when the -o parameter is specified, with the -f parameter available for filtering.

These cumulative updates improve the project's robustness, accuracy, and user control over feature detection and data representation.
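A sliding-window byte match of the kind described for BytesFeature's evaluate method can be written with the standard library's `slice::windows`. The helpers `find_bytes` and `contains_bytes` are hypothetical; BytesFeature's real fields and signature are not shown in this PR. Treating an empty pattern as "no match" is also an assumption, in line with filtering out empty features.

```rust
/// Hypothetical helper: returns the offset of the first occurrence of
/// `pattern` in `haystack`, sliding a window of `pattern.len()` bytes
/// forward one byte at a time.
fn find_bytes(haystack: &[u8], pattern: &[u8]) -> Option<usize> {
    // `windows(0)` would panic, and an empty pattern carries no signal,
    // so treat it as "no match" (assumption).
    if pattern.is_empty() {
        return None;
    }
    // If the pattern is longer than the haystack, `windows` yields nothing.
    haystack.windows(pattern.len()).position(|w| w == pattern)
}

/// Convenience wrapper in the shape of a boolean evaluate() result.
fn contains_bytes(haystack: &[u8], pattern: &[u8]) -> bool {
    find_bytes(haystack, pattern).is_some()
}
```

This naive scan is O(n × m), which is fine for short signatures; for large inputs, a dedicated substring search such as `memchr::memmem::find` from the memchr crate is a common alternative.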

jorgeaduran · 2024-02-17T07:55:02Z

P.S.: For this to compile, the smda PR must be accepted first.

marirs · 2024-02-17T08:22:18Z

Hey, thanks again!

One quick question.

This, which I refactored:

if let Some(Yaml::String(s)) = rule.meta.get(&Yaml::String("namespace".to_string())) {
    self.capability_namespaces.insert(rule.name.clone(), s.clone());
    let first_non_zero_address = caps
        .iter()
        .find(|&&(addr, _)| addr != 0)
        .map(|&(addr, _)| addr)
        .unwrap_or(0);

    let _ = self
        .capabilities_associations
        .entry(rule.name.clone())
        .or_insert_with(|| CapabilityAssociation {
            attack: local_attacks_set.clone(),
            mbc: local_mbc_set.clone(),
            namespace: s.clone(),
            name: rule.name.clone(),
            address: first_non_zero_address as usize,
        });
}

and this (which you reverted):

if let Some(namespace) = rule.meta.get(&Yaml::String("namespace".to_string())) {
    if let Yaml::String(s) = namespace {
        self.capability_namespaces
            .insert(rule.name.clone(), s.clone());
        let first_non_zero_address = caps
            .iter()
            .find(|&&(addr, _)| addr != 0)
            .map(|&(addr, _)| addr)
            .unwrap_or(0);

        let _ = self
            .capabilities_associations
            .entry(rule.name.clone())
            .or_insert_with(|| CapabilityAssociation {
                attack: local_attacks_set.clone(),
                mbc: local_mbc_set.clone(),
                namespace: s.clone(),
                name: rule.name.clone(),
                address: first_non_zero_address as usize,
            });
    }
}

are the same. Any reason why you reverted it?
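To make the equivalence concrete: `if let Some(Yaml::String(s)) = ...` succeeds exactly when the outer value is `Some(_)` and the inner value is the `String` variant, which is what the two nested `if let`s check step by step. The sketch below uses a small stand-in `Value` enum and a `HashMap<String, Value>` keyed by plain strings in place of the `Yaml` type and `rule.meta`, so it compiles without dependencies; those names are hypothetical.

```rust
use std::collections::HashMap;

/// Stand-in for the YAML value type stored in rule metadata (hypothetical).
enum Value {
    Str(String),
    Null,
}

/// Refactored form: a single nested pattern.
fn namespace_flat(meta: &HashMap<String, Value>) -> Option<String> {
    if let Some(Value::Str(s)) = meta.get("namespace") {
        return Some(s.clone());
    }
    None
}

/// Original form: unwrap the Option first, then match the variant.
fn namespace_nested(meta: &HashMap<String, Value>) -> Option<String> {
    if let Some(namespace) = meta.get("namespace") {
        if let Value::Str(s) = namespace {
            return Some(s.clone());
        }
    }
    None
}
```

Clippy's `collapsible_match` lint typically flags the nested form and suggests the single-pattern version.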

jorgeaduran · 2024-02-17T08:36:34Z

Thank you for highlighting this. After a closer look, it seems the modifications indeed revolve around formatting rather than functional changes. It's possible that I worked on a version of the code that didn't include your recent changes, leading to the unintended omission of your refinements. I apologize for any confusion this may have caused and will make sure to synchronize changes more carefully in the future to avoid such oversights.

marirs · 2024-02-17T08:38:14Z

Nah, don't worry about that. I'll make the update now!
Just thought maybe it broke something.
Thanks again so much for your awesome pushes :)

marirs · 2024-02-17T08:39:59Z

Done, I've pushed this as well :)

Thanks

jorgeaduran · 2024-02-17T08:52:20Z

Thank you for your understanding and for handling the update. I'm glad it didn't cause any issues. I appreciate the collaboration and look forward to our continued work together. Thanks again! :)

jorgeaduran added 2 commits February 17, 2024 08:57

- Updated Cargo.toml (2b5f678)
- Updated Cargo.toml (dc46bb8)

marirs merged commit b6bb90c into marirs:master Feb 17, 2024
4 checks passed
